Readme: Description of Data Sets and Do Files for the Accidental Deaths and JSL Reform Paper

by Dan Carvell (daniel.n.carvell@gmail.com, CARVELLD@cna.org and dnc2101@columbia.edu), June 2011

This file describes the data sets and do files posted on Dataverse for the paper "Accidental Death and the Rule of Joint and Several Liability".  

This paper estimates the effects of JSL reforms and other tort reforms that have been enacted by state governments on the rate of accidental deaths in states, using
state-year level data on the presence of tort reforms, accidental death rates and controls.  As such, the main two datasets for this project consist of state-year
level data compiled from various sources into a single "raw" dataset, which is then turned into a "cleaned" dataset with all of the variables and data points for the
regressions.  These datasets have been posted here on Dataverse, along with a spreadsheet full of textual background legal information on state tort reforms that underlies
the numerical information on state tort reforms in the data for the regressions.  Also posted here are the STATA do files that were used to clean the data and run the
regressions.  The main do files of interest are the do file in the "do_files/clean_data/clean_state_year_data" sub-subfolder and the do file in the 
"do_files/run_regressions" subfolder.  The former cleans the raw state-year data to prepare it for the regressions, and the latter runs the regressions.  The 
other do files in the "do_files/clean_data" subfolder were used to take Census Bureau data and National Center for Health Statistics Vital Statistics Mortality Files data
and collapse it to the state-year level for this project.  The state-year Census and Vital Statistics data that we generated was placed into the "raw" state-year level
dataset along with other state-year level variables that we collected.  





Brief Description of All of the Files


The folder "do_files" contains the do files for this paper, which clean the data and run the regressions.

The subfolder "do_files/clean_data" contains the do files that clean the raw data for this paper, to get the data in the right format to run the regressions, and
so that all of the variables for the regressions have been created and are included in the data.  

The sub-subfolder "do_files/clean_data/clean_Census_data" contains do files that are used to create state-year level counts of the number of state residents in different 
demographic groups from raw Census data from the Census Bureau website.  These state-year level counts are used to create control variables for the regressions for 
this paper, as well as denominators for the accidental death rate variables that are the dependent variables in this paper.

The sub-subfolder "do_files/clean_data/clean_Vital_Stats_data" contains do files that are used to create state-year level counts of the number of accidental deaths 
of different types that occur within each state in each year of the data.  These do files take the raw individual-death level Vital Statistics Mortality Files data, clean
it, and collapse it to the state-year level.  These counts of accidental deaths of different types are used as numerators in many of the accidental death 
rate variables that are the dependent variables in this paper.  (Note:  a few of the accidental death rate variables for this paper come from data downloaded from 
the Centers for Disease Control and Prevention's WISQARS website, http://www.cdc.gov/injury/wisqars/fatal.html.  This website provides state-year level data on rates of many types of 
accidental deaths, which the CDC constructed from the Vital Statistics Mortality Files data and Census Bureau data.  The WISQARS death rate data
were used as an initial source of data for the first set of regressions run for this project, since they let me obtain accidental death rate data quickly 
and skip all of the steps involved in constructing rates from the Vital Statistics and Census data myself.)

The sub-subfolder "do_files/clean_data/clean_state_year_data" contains a do file that takes the raw state-year data for this project and cleans it and creates new 
variables with it in order to create the dataset that is used to run the regressions for this project.  The raw state-year level data for this project contains data
on the presence of tort reforms within each state in each year, that we entered by hand into a spreadsheet in order to code up textual information about tort reforms
that is contained in the database of state tort reform information that we created for this project.  The raw state-year level data for this project also contains
some state-year level control data that was downloaded for this project and copied and pasted into this spreadsheet, as well as data on state-year counts of the number 
of state residents in different demographic groups that was created by the do files in the "do_files/clean_data/clean_Census_data" sub-subfolder, and data on state-year 
level counts of the number of accidental deaths of different types that occur within each state that was created by the do files in the 
"do_files/clean_data/clean_Vital_Stats_data" sub-subfolder.  

The subfolder "do_files/run_regressions" contains the do file that runs the regressions for this paper and creates output tables with the coefficients and standard
errors on the variables of interest in the regressions, as well as the R-squared values from the regressions.  




The folder "data" contains the datasets for this project.

The sub-folder "data/cleaned_data_for_regressions" contains the dataset used to run the regressions.  This dataset has all of the variables
used in the regressions.

The sub-folder "data/raw_state_year_data" has the raw state-year level data for this project.  The state-year data comes from multiple sources.  
Some of the data come from Census Bureau and Vital Statistics Mortality Files data that was originally not at the state-year level, but that 
was cleaned and then collapsed to the state-year level for this project.  However, this state-year level dataset is "raw" in the sense that
it still requires further cleaning and new variables need to be generated from this data in order to run the regressions for this paper.  This dataset
is posted in both XML Excel format and STATA format.

The sub-folder "data/state_tort_reform_information_spreadsheet" contains a file with detailed textual information on the presence of tort reforms
of different types in each state in each year of the data, including citations of the statutes and tort cases that created, struck down and repealed
each tort reform and the exact dates on which these law changes occurred.  This textual information was used to create the numerical variables 
on the presence of tort reforms within each state at different times that were entered into the raw state-year level dataset for this paper.  











More Detailed Description of the do files that clean the raw Census Bureau and Vital Statistics data for this paper, and that give links to the websites where
this raw data can be downloaded.    


Census Bureau data:  
The "do_files/clean_data/clean_Census_data" sub-subfolder contains programs that take raw Census data on the number 
of persons in different demographic groups in each state in each year, clean it, and generate the state-year
level demographic variables that are used for two purposes in this paper.  Some of these variables are used in 
generating the state-year demographic control variables included in the regressions for this paper.  Other 
variables are used in making the denominators for the dependent variables that examine the effects of tort 
reforms on accidental death rates within specific age groups - the denominators for the dependent variables in 
Table 6 of the paper.  The do file "create_age_race_gender.do" in this sub-subfolder creates 
variables used for the former of these purposes, and the do file "create_age_race_gender_Part_2.do" creates 
variables used for the latter of these purposes.  The exact Census Bureau websites from which this raw Census data
can be downloaded are listed in the introductory notes at the beginning of each of these two do files.  
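
As a rough illustration of how a state-year death count and a state-year population count combine into a
death rate, here is a minimal Stata sketch on toy data.  It is not taken from the posted do files.  The 
variable names (state_fips, year, pop, n_accidents, accident_rate) and the per-100,000 scaling are 
assumptions made for this example only.

```stata
* Toy population counts standing in for the Census-based denominators.
clear
input state_fips year pop
1 1981 500000
2 1981 1000000
end
tempfile pop
save `pop'

* Toy death counts standing in for the Vital Stats-based numerators.
clear
input state_fips year n_accidents
1 1981 10
2 1981 30
end

* Attach the denominator to each state-year cell; assert(match) stops the
* run if any state-year is missing from either side.
merge 1:1 state_fips year using `pop', assert(match) nogenerate

* Deaths per 100,000 residents (the scaling is an assumption of this sketch).
gen accident_rate = n_accidents * 100000 / pop
list
```

Merging on both state_fips and year keeps each numerator paired with the denominator for the same 
state-year cell.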

 
Vital Statistics data:
The sub-subfolder "do_files/clean_data/clean_Vital_Stats_data" contains programs for loading, cleaning, and generating 
variables with the Vital Statistics Mortality Files.  The "Vital Stats" data are the basic source for many of the dependent 
variables in this paper.  Other data on accidental death rates were also downloaded from the Centers for Disease Control and Prevention's 
WISQARS Fatal Injuries:  Mortality Reports website at http://webappa.cdc.gov/sasweb/ncipc/mortrate.html .  The WISQARS data were 
downloaded in Summer 2007, entered into the Excel XML file "Raw_Data_Deaths_XML", and then
imported into Stata.  The WISQARS data are themselves derived from the Vital Stats records.  They were used at the very
beginning of this project because they are the accidental death rate data used in Rubin and Shepherd, Journal of Law and 
Economics (2007), and because it was quick and easy to do the initial regressions for this project using these data rather than
working directly with the raw Vital Stats data myself.

The do files in the sub-sub-subfolder "do_files/clean_data/clean_Vital_Stats_data/Load_and_Clean_Raw_Vital_Stats"
do the following tasks.  First, they import the raw individual-death level Vital Stats Mortality data for each year from 
1981 to 2004 into STATA.  (There are separate raw data files for each year of the data, so multiple do files,
one for each year, are needed for this task.  The raw data for each year, as well as some do and dct files
for this task, were downloaded from the NBER website at 
http://www.nber.org/data/vital-statistics-mortality-data-multiple-cause-of-death.html in 2007.)  The do files then add 
variable labels to this data.  Since the data from all of these different years 
needs to be appended together, but appending data on every death in every year from 1981 to 2004 would produce 
an extremely large, cumbersome dataset, the do files create datasets for each year of the data that just 
contain data on injury deaths (accidents, murders, suicides, and a few injury deaths where intent was 
unknown).  They then append the data on injury deaths from each year together.
The end result of running all of the do files in this folder is to produce the datasets "Injury_Deaths_81_98.dta" 
and "Injury_Deaths_99_04.dta".
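
The keep-then-append pattern described above can be sketched on toy data as follows.  This is only an 
illustration under stated assumptions, not the code in the posted do files:  the three-year range, the 
indicator variable "injury", and the generated toy observations are all hypothetical.

```stata
tempfile stacked
forvalues y = 1981/1983 {
    * Toy stand-in for one year's individual-death file:
    * 3 deaths, of which the first 2 are injury deaths.
    clear
    set obs 3
    gen year = `y'
    gen injury = (_n <= 2)

    * Drop non-injury deaths before stacking, to keep the combined file small.
    keep if injury == 1

    * Stack the current year onto the years processed so far.
    if `y' > 1981 {
        append using `stacked'
    }
    save `stacked', replace
}
tabulate year
```

Filtering each year before appending means the full set of deaths from all years never has to be held 
in memory at once.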

The do files in "do_files/clean_data/clean_Vital_Stats_data/Load_and_Clean_Raw_Vital_Stats" should be run in the following order:
1)  load_mort1981.do
2)  clean_mort_1981.do
3)  load_mort1982.do through load_mort2004.do
4)  make_state_fips_03_04.do
5)  append_ICD_9_years.do and append_ICD_10_years.do

make_state_fips_03_04.do needs to be run so that the state FIPS code variable gets added to the Vital 
Stats data from 2003 and 2004.  State FIPS codes are needed for merges, collapsing to state-year 
level cells, and running regressions.  
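
One way to run these steps in the listed order is a short master do file like the hypothetical sketch 
below.  It is not one of the posted files.  It assumes that the working directory is the 
Load_and_Clean_Raw_Vital_Stats folder, that the raw NBER data files are already in place, and that 
the yearly loaders for 1982 to 2004 are all named load_mort followed by the four-digit year.

```stata
* Hypothetical master script; run from inside Load_and_Clean_Raw_Vital_Stats.

* Steps 1 and 2: load and clean the 1981 data.
do load_mort1981.do
do clean_mort_1981.do

* Step 3: load each remaining year (file-name pattern is an assumption).
forvalues y = 1982/2004 {
    do load_mort`y'.do
}

* Step 4: add state FIPS codes to the 2003 and 2004 data.
do make_state_fips_03_04.do

* Step 5: append the yearly injury-death files within each ICD regime.
do append_ICD_9_years.do
do append_ICD_10_years.do
```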

The dataset "Injury_Deaths_81_98.dta"
contains data on all injury deaths that occurred in the US from 1981 to 1998, years in which the 
International Classification of Diseases, Ninth Revision (ICD-9), was used to classify the cause of each death.
The dataset "Injury_Deaths_99_04.dta" contains data on all injury deaths that occurred in the US from 1999 
to 2004, years in which the International Classification of Diseases, Tenth Revision (ICD-10), was used to 
classify the cause of each death.  The datasets for these two sets of years differ enough that they are 
not appended together at this point.

The do files in the sub-sub-subfolder "do_files/clean_data/clean_Vital_Stats_data/Manipulate_Cleaned_Vital_Stats" 
take the datasets "Injury_Deaths_81_98.dta" and "Injury_Deaths_99_04.dta" and
generate state-year counts of different types of accidental deaths, that are used as the numerators in 
constructing many of the dependent variables used in the regressions for the paper.  The do files 
make_age_group_count_vars.do and make_age_group_count_vars_99_04.do create state-year counts of accidental
deaths, other than auto accidents and drug overdoses, that occur amongst members of different demographic
groups - these are the numerators of the dependent variables used in the regressions shown in Table 6 of the 
paper.  make_drug_OD_vars.do and make_drug_OD_vars_ICD_10.do create counts of the number of fatal overdoses
on illegal drugs and abused pharmaceuticals that occur in each state in each year, the numerator used in 
constructing the rate of overdoses.  make_where_pronounced_dead_vars.do creates the state-year counts of 
accidental deaths, other than auto accidents and drug overdoses, that occur for persons pronounced dead
outside of hospitals. 
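
The step from individual-death records to state-year counts can be illustrated with the toy Stata sketch 
below.  It is not the code in the posted do files, and the indicator variables (accident, auto_accident, 
drug_od) and the output name n_other_accidents are hypothetical names chosen for this example.

```stata
* Toy individual-death records; each row is one death.
clear
input state_fips year accident auto_accident drug_od
1 1981 1 0 0
1 1981 1 1 0
1 1981 1 0 0
2 1981 1 0 1
2 1981 1 0 0
2 1981 0 0 0
end

* Keep accidental deaths other than auto accidents and drug overdoses.
keep if accident == 1 & auto_accident == 0 & drug_od == 0

* Collapse to one row per state-year holding the number of remaining deaths.
gen byte one = 1
collapse (sum) n_other_accidents = one, by(state_fips year)
list
```

Note that a state-year with no qualifying deaths produces no row after collapse, so such cells would 
need to be filled in with zeros after merging onto a complete list of state-years.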




More detailed description of the raw state-year level dataset for this paper:

The subfolder /data/raw_state_year_data contains raw data for this project, in both XML / Excel format and 
STATA format.  The XML file contains raw data on the presence of tort reforms within states in different years,
which was originally coded up in Excel.  It also contains data on accidental death rates that were 
downloaded from the Centers for Disease Control and Prevention's WISQARS Fatal Injuries: Mortality Reports website at 
http://webappa.cdc.gov/sasweb/ncipc/mortrate.html .  The WISQARS data were downloaded in Summer 2007 and 
entered into the XML file.  The WISQARS data are themselves derived from the Vital Stats 
records.  They were used at the start of this project because they are the accidental death rate data used by 
Rubin and Shepherd (2007), and because it was quick and easy to do the initial regressions for this project 
using these data rather than working directly with the raw Vital Stats data myself.  Finally, the raw dataset
contains variables generated from the raw Census and Vital Stats Mortality data.  Those data were placed into 
this dataset, so that all of it is in one dataset that can be opened up by the do file 
do_files/clean_data/clean_state_year_data/clean_data.do that generates the variables used in the regressions.  
In addition, those data were placed into this dataset because I also wanted to browse through the variables I 
generated with the raw Census and Vital Stats data to check for mistakes and inconsistencies, which is easier to 
do in Excel than in STATA. 




















